-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid blocking in BigQuery split and page sources #23741
Avoid blocking in BigQuery split and page sources #23741
Conversation
@ForBigQuery | ||
public ExecutorService provideExecutor(CatalogName catalogName) | ||
{ | ||
return newCachedThreadPool(daemonThreadsNamed("bigquery-" + catalogName + "-%s")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should have an upper bound on the number of threads here, but I wasn't sure if we can come up with a constant value or should it be configurable.
There's also another executor right above, but I didn't want to mix them up. The naming can be improved here for sure, but I also didn't have good ideas.
/test-with-secrets sha=f76016e22128a175fbf1e08a11efdac748f27ab6 |
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/11337589498 |
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/LazySplitSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/LazySplitSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryQueryPageSource.java
Outdated
Show resolved
Hide resolved
...in/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageArrowPageSource.java
Outdated
Show resolved
Hide resolved
...in/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageArrowPageSource.java
Outdated
Show resolved
Hide resolved
...in/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageArrowPageSource.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageAvroPageSource.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryQueryPageSource.java
Outdated
Show resolved
Hide resolved
...in/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageArrowPageSource.java
Outdated
Show resolved
Hide resolved
2e4ef46
to
d3c123a
Compare
@ebyhr please run the tests again, I ran them locally too. |
/test-with-secrets sha=d3c123a03207caaa617346bc7f1d795fa49a6da8 |
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/11355050648 |
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitSource.java
Show resolved
Hide resolved
private List<BigQuerySplit> splits; | ||
private int offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could keep the splits as an iterator to avoid tracking offset explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that complicate getting the next batch? Is there an existing example somewhere how to do that elegantly? I think we've extended the scope of this PR enough.
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitSource.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryConnectorModule.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/ForBigQuery.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryQueryPageSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryQueryPageSource.java
Outdated
Show resolved
Hide resolved
...in/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageArrowPageSource.java
Outdated
Show resolved
Hide resolved
...in/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageArrowPageSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryStorageAvroPageSource.java
Outdated
Show resolved
Hide resolved
0894daa
to
3d269a4
Compare
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryQueryPageSource.java
Outdated
Show resolved
Hide resolved
@raunaqmorarka or @ebyhr, can you run tests with secrets? @ebyhr would you also like to review this PR? |
ed1eba5
to
4d50d8f
Compare
/test-with-secrets sha=4d50d8fc942292a36afee39b4916582c2a3ec40f |
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/11364715101 |
Could you rebase on master to resolve conflicts? |
4d50d8f
to
27efd24
Compare
Description
Avoid blocking in BigQuery split and page source constructors, which execute queries, by starting them in a background thread. This should reduce congestion on busy clusters.
Additional context and related issues
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: